Asymptotic Bayes Analysis for the Finite Horizon One Armed Bandit Problem

Authors

  • Apostolos N. Burnetas
  • Michael N. Katehakis
Abstract

The multi-armed bandit problem is often taken as a basic model for the trade-off between the exploration and exploitation required for efficient optimization under uncertainty. In this paper we study the situation in which the unknown performance of a new bandit is to be evaluated and compared with that of a known one over a finite horizon. We assume that the bandits represent random variables with distributions from the one-parameter exponential family. When the objective is to maximize the Bayes expected sum of outcomes over a finite horizon, it is shown that optimal policies tend to simple limits when the length of the horizon is large.
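To make the objective concrete, here is a minimal sketch of the finite-horizon Bayes problem for the simplest exponential-family member: a Bernoulli arm with a Beta prior played against an arm of known mean. The known mean LAMBDA, the Beta(1, 1) prior, and the function names are illustrative assumptions, not the paper's construction. Backward induction gives the Bayes-optimal value, using the classical one-armed-bandit fact that once switching to the known arm is optimal it stays optimal, so stopping is worth k * LAMBDA:

from functools import lru_cache

LAMBDA = 0.6  # mean reward of the known arm (assumed value)

@lru_cache(maxsize=None)
def value(a, b, k):
    """Bayes-optimal expected remaining reward with k pulls left,
    when the unknown Bernoulli arm has a Beta(a, b) posterior."""
    if k == 0:
        return 0.0
    p = a / (a + b)    # posterior mean of the unknown arm
    stop = k * LAMBDA  # commit to the known arm for the rest of the horizon
    cont = p * (1 + value(a + 1, b, k - 1)) + (1 - p) * value(a, b + 1, k - 1)
    return max(stop, cont)

def pull_unknown(a, b, k):
    """True if sampling the unknown arm is Bayes-optimal in state (a, b, k)."""
    p = a / (a + b)
    cont = p * (1 + value(a + 1, b, k - 1)) + (1 - p) * value(a, b + 1, k - 1)
    return cont > k * LAMBDA

# With a uniform prior and 50 pulls left, exploration can be optimal
# even though the prior mean 0.5 is below the known mean 0.6.
print(value(1, 1, 50), pull_unknown(1, 1, 50))

Tabulating pull_unknown over posterior states for growing k shows the decision boundary settling down, which illustrates in miniature the limiting behavior the abstract refers to.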

Similar Articles

An Asymptotic Minimax Theorem for Gaussian Two-Armed Bandit

The asymptotic minimax theorem for the Bernoulli two-armed bandit problem states that the minimax risk has the order √N as N → ∞, where N is the control horizon, and provides estimates of the factor. For the Gaussian two-armed bandit with unit variances of one-step incomes and close expectations, we improve the asymptotic minimax theorem as follows: the minimax risk is approximately equal to 0.637√N as N ...

Contributions to the Asymptotic Minimax Theorem for the Two-Armed Bandit Problem

The asymptotic minimax theorem for the Bernoulli two-armed bandit problem states that the minimax risk has the order √N as N → ∞, where N is the control horizon, and provides lower and upper estimates. It can be easily extended to the normal two-armed bandit. For the normal two-armed bandit, we generalize the asymptotic minimax theorem as follows: the minimax risk is approximately equal to 0.637√N as N → ∞ ...
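In display form, the statement common to this snippet and the previous one reads (a paraphrase with R_N denoting the minimax risk, not a quotation from either paper):

    \lim_{N \to \infty} \frac{R_N}{\sqrt{N}} \approx 0.637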

Computing a Classic Index for Finite-Horizon Bandits

This paper considers the efficient exact computation of the counterpart of the Gittins index for a finite-horizon discrete-state bandit, which measures for each initial state the average productivity, given by the maximum ratio of expected total discounted reward earned to expected total discounted time expended that can be achieved through a number of successive plays stopping by the given horizon ...
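The defining ratio admits a standard calibration computation: the index is the largest per-play charge at which playing the arm at least once still breaks even. Below is a minimal sketch for a Bayesian Bernoulli arm with Beta posterior states, solved by bisection; the discount BETA, the Beta-Bernoulli model, and the name fh_index are assumptions for illustration, not the paper's (more efficient, exact) scheme:

from functools import lru_cache

BETA = 0.95  # discount factor (assumed)

def fh_index(a, b, horizon, tol=1e-6):
    """Finite-horizon Gittins-style index of a Bernoulli arm whose
    success probability has a Beta(a, b) posterior."""

    def surplus(lam):
        # Optimal-stopping value when every play is charged lam.
        @lru_cache(maxsize=None)
        def w(a_, b_, k):
            if k == 0:
                return 0.0
            p = a_ / (a_ + b_)
            play = (p - lam
                    + BETA * (p * w(a_ + 1, b_, k - 1)
                              + (1 - p) * w(a_, b_ + 1, k - 1)))
            return max(0.0, play)  # stopping is always worth 0
        return w(a, b, horizon)

    lo, hi = 0.0, 1.0  # Bernoulli rewards, so the index lies in [0, 1]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if surplus(mid) > 0.0:
            lo = mid  # playing is still profitable: index is higher
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Index of a fresh arm (uniform prior) with at most 20 plays remaining.
print(fh_index(1, 1, 20))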

Nearly Optimal Exploration-Exploitation Decision Thresholds

While trading off exploration and exploitation in reinforcement learning is hard in general, relatively simple solutions exist under some formulations. Optimal decision thresholds for the multi-armed bandit problem are derived, one for the infinite-horizon discounted reward case and one for the finite-horizon undiscounted reward case, which make the link between the reward horizon, uncertainty ...

A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences

We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of Burnetas and Katehakis (1996). Our contribution is to provide a finite-time analysis of this algorithm; we get bounds whose main terms are smaller than the ones of pre...
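In the Bernoulli special case, a KL-based upper confidence index reduces to a one-dimensional bisection; the paper itself works with empirical distributions over general finite supports and a refined exploration rate, so the following sketch with the plain log(t) exploration function is a simplification under stated assumptions:

import math

def kl_bern(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t, tol=1e-6):
    """Largest q >= mean with pulls * KL(mean, q) <= log(t), found by
    bisection; KL(mean, .) is increasing on [mean, 1)."""
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kl_bern(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

# Each round t, play the arm maximizing this index of its empirical mean.
print(kl_ucb_index(0.5, 10, 100))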

Publication date: 2016